\begin{abstract}
We propose efficient algorithms for performing the Kolmogorov-Smirnov test on streaming data. The Kolmogorov-Smirnov test is a non-parametric test for measuring the strength of a hypothesis that some data is drawn from a fixed distribution (one-sample test), or that two sets of data are drawn from the same distribution (two-sample test). Unlike some other tests, Kolmogorov-Smirnov does not assume that the distribution has a known form (e.g., it is normal), and in the two-sample case it need not know anything about the distribution, other than that it is continuous. Motivated by significant increases in the size of datasets available today, we present algorithms for both the one-sample and the two-sample tests for data processed in a stream. 

Our algorithms take advantage of the rich literature on computing quantiles in a stream. We show how the estimation of the statistic for the Kolmogorov-Smirnov test can be done with high accuracy in terms of the error guarantees of these sketches---the error can be made arbitrarily small at the expense of more memory. We also demonstrate the accuracy of our algorithms via extensive experimentation on both real and synthetic datasets. We show that our algorithms are superior to sampling, and that it is feasible to reliably perform the test using our algorithms with a reduction in data of several orders of magnitude.
\end{abstract}

